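The Great Divergence marks a fundamental turning point in the history of microprocessors. Between 2001 and 2009, the performance trajectories of CPUs and GPUs split apart, opening a large capability gap. Traditional CPUs ran into the Power Wall, where raising the clock frequency produces more heat than the chip can dissipate, while GPUs, funded by a massive installed base of consumer gamers, moved toward extremely parallel architectures.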
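To make the Power Wall concrete, here is a minimal sketch based on the standard first-order approximation for CMOS dynamic power, P ≈ activity × C × V² × f. The baseline values and the assumption that supply voltage must rise in proportion to clock frequency are illustrative simplifications, not figures from this text.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """Approximate CMOS dynamic power in watts: activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical baseline chip; illustrative numbers, not measurements.
BASE_CAPACITANCE_F = 1e-9  # effective switched capacitance, in farads
BASE_VOLTAGE_V = 1.0
BASE_FREQUENCY_HZ = 2e9


def relative_power(frequency_scale):
    """Power relative to the baseline when the clock is scaled.

    Assumes voltage scales linearly with frequency, which is a
    simplification; under it, power grows with the cube of the clock.
    """
    base = dynamic_power(BASE_CAPACITANCE_F, BASE_VOLTAGE_V, BASE_FREQUENCY_HZ)
    scaled = dynamic_power(
        BASE_CAPACITANCE_F,
        BASE_VOLTAGE_V * frequency_scale,
        BASE_FREQUENCY_HZ * frequency_scale,
    )
    return scaled / base


if __name__ == "__main__":
    for scale in (1.0, 1.5, 2.0):
        print(f"{scale:.1f}x clock -> {relative_power(scale):.2f}x power")
```

Under these assumptions, doubling the clock costs about eight times the power, which is why adding more modestly clocked units became more attractive than pushing a single core faster.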
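Key Turning Points
By 2003, the gap had begun to widen. CPUs stayed focused on sequential logic and low-latency optimization, while GPUs devoted most of their transistor budget to arithmetic logic units (ALUs). This took GPU throughput from the gigaflops (GFLOPS) range toward the teraflops (TFLOPS) range, while the CPU growth curve stayed much flatter.
By 2009, a high-end Intel Core i7-960 delivered roughly 70 GFLOPS, while the NVIDIA GTX 280 reached nearly 933 GFLOPS. This was more than a speedup: it was a fundamental restructuring of how computation is done, one that prioritizes throughput over the execution speed of any single instruction.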
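The size of the 2009 gap can be checked with a few lines of arithmetic. This sketch uses only the two figures quoted above; the function names are illustrative.

```python
# Peak throughput figures quoted in the text for the 2009 snapshot.
CPU_GFLOPS = 70.0   # Intel Core i7-960 (approximate)
GPU_GFLOPS = 933.0  # NVIDIA GTX 280 (approximate)


def throughput_ratio(gpu_gflops, cpu_gflops):
    """How many times higher the GPU's peak throughput is than the CPU's."""
    return gpu_gflops / cpu_gflops


def gflops_to_tflops(gflops):
    """Convert GFLOPS (10^9 FLOP/s) to TFLOPS (10^12 FLOP/s)."""
    return gflops / 1000.0


if __name__ == "__main__":
    ratio = throughput_ratio(GPU_GFLOPS, CPU_GFLOPS)
    print(f"GPU/CPU peak ratio: {ratio:.1f}x")
    print(f"GPU peak: {gflops_to_tflops(GPU_GFLOPS):.3f} TFLOPS")
```

The ratio comes out at about 13x, and 933 GFLOPS is just under one teraflop.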
QUESTION 1
What primary constraint led to the 'Power Wall' for traditional CPUs?
The lack of available memory in the early 2000s.
Thermal and power limitations when increasing clock speeds.
A shortage of transistors on the silicon die.
The transition from 32-bit to 64-bit architectures.
✅ Correct!
As clock speeds increased, power consumption and heat dissipation became unmanageable for single-core serial processors.
❌ Incorrect
The Power Wall refers specifically to the energy and heat limits of pushing sequential clock frequencies higher.
QUESTION 2
According to the Great Divergence, which industry provided the economic engine for GPU R&D?
The Financial High-Frequency Trading market.
The Oil and Gas seismic exploration industry.
The Video Game industry.
The Cryptocurrency mining industry.
✅ Correct!
Exactly. The massive consumer installed base of gamers funded the rapid iteration of parallel graphics hardware.
❌ Incorrect
While those industries use GPUs now, the video game market was the original driver of mass-market GPU evolution.
QUESTION 3
By 2009, how did the peak performance of an NVIDIA GTX 280 compare to an Intel Core i7-960?
They were roughly equal in throughput.
The CPU was twice as fast as the GPU.
The GPU was more than an order of magnitude higher (~13x).
The GPU was 100x faster than the CPU.
✅ Correct!
The GTX 280 offered ~933 GFLOPS compared to the i7's ~70 GFLOPS, a throughput gap of roughly 13x.
❌ Incorrect
Look at the 2009 snapshot: 933 GFLOPS (GPU) vs 70 GFLOPS (CPU). That is more than a 10x difference.
QUESTION 4
GPUs achieve higher throughput by dedicating more transistors to which component?
Large Level-3 Caches.
Complex Branch Prediction logic.
Arithmetic Logic Units (ALUs).
Instruction Decoders.
✅ Correct!
GPUs prioritize raw math execution units (ALUs) over the complex control logic used by CPUs to speed up sequential code.
❌ Incorrect
CPUs use transistors for Control and Cache; GPUs use them for ALUs to perform massive parallel math.
QUESTION 5
What is the correct unit for measuring one trillion floating-point operations per second?
GFLOPS.
Teraflops.
Petaflops.
Megaflops.
✅ Correct!
Tera (T) stands for trillion ($10^{12}$), while Giga (G) stands for billion ($10^{9}$).
❌ Incorrect
GFLOPS are billions; Teraflops are trillions.
Case Study: The 2003 Turning Point
Architectural Analysis of Figure 1.1
A hardware architect in 2003 is comparing the Pentium 4 to the GeForce FX 5800. At this moment, the performance lines on Figure 1.1 are still close together. However, within 5 years, the GPU line will accelerate exponentially while the CPU line remains linear.
Q
1. Why did the GPU trajectory become exponential while the CPU remained linear?
Solution:
GPUs are designed for data parallelism, meaning performance scales directly with the number of processing cores (ALUs) added. CPUs are limited by sequential dependencies and the complexity of managing single-thread instruction pipelines under power constraints.
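The contrast in this solution can be sketched in plain Python. The first function is data parallel: every output element depends only on its own inputs, so the work can be split across any number of workers without changing the result. The second has a loop-carried dependency as written, so each step must wait for the one before it. The function names and the chunking scheme are illustrative assumptions, and the chunks here run one after another to show independence rather than real parallel execution.

```python
def saxpy(a, xs, ys):
    """Element-wise a*x + y; each output depends only on its own inputs."""
    return [a * x + y for x, y in zip(xs, ys)]


def saxpy_in_chunks(a, xs, ys, num_workers):
    """Split the same work into independent chunks, one per worker.

    The chunks are processed sequentially here, but no chunk reads
    another chunk's results, so they could run at the same time.
    """
    chunk = max(1, -(-len(xs) // num_workers))  # ceiling division
    out = []
    for start in range(0, len(xs), chunk):
        end = start + chunk
        out.extend(saxpy(a, xs[start:end], ys[start:end]))
    return out


def running_sum(xs):
    """Sequential dependency: each step needs the previous total."""
    total = 0.0
    out = []
    for x in xs:
        total += x
        out.append(total)
    return out


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [10.0, 20.0, 30.0, 40.0, 50.0]
    for workers in (1, 2, 4):
        print(workers, saxpy_in_chunks(2.0, xs, ys, workers))
    print(running_sum(xs))
```

Because the chunked result is identical for any worker count, adding more arithmetic units raises throughput on this kind of workload; the running sum, in this naive form, gains nothing from extra units.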
Q
2. What metric on the Y-axis of Figure 1.1 defines this 'Divergence'?
Solution:
Theoretical peak GFLOPS (billions of floating-point operations per second), which measures the maximum floating-point throughput the hardware could reach in principle, not what it sustains on real workloads.
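A theoretical peak figure is usually derived by multiplying the number of execution units, the clock rate, and the floating-point operations each unit can issue per cycle. The sketch below applies that formula to the GTX 280. The hardware parameters (240 stream processors, a 1.296 GHz shader clock, and 3 FLOPs per cycle per processor) are commonly published specifications rather than values given in this text, so treat them as assumptions.

```python
def peak_gflops(num_cores, clock_ghz, flops_per_cycle):
    """Theoretical peak = cores * clock (GHz) * FLOPs per core per cycle."""
    return num_cores * clock_ghz * flops_per_cycle


# Assumed GTX 280 parameters (published specs, not taken from this text).
GTX280_CORES = 240
GTX280_SHADER_CLOCK_GHZ = 1.296
GTX280_FLOPS_PER_CYCLE = 3  # one multiply-add (2 FLOPs) plus one multiply


if __name__ == "__main__":
    peak = peak_gflops(
        GTX280_CORES, GTX280_SHADER_CLOCK_GHZ, GTX280_FLOPS_PER_CYCLE
    )
    print(f"GTX 280 theoretical peak: {peak:.2f} GFLOPS")
```

Under these assumptions the formula gives about 933 GFLOPS, matching the figure quoted for the 2009 snapshot. The number assumes every unit issues its maximum work on every cycle, which real programs rarely achieve.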